Efficient Methods for the Design and Training of Neural Networks
The field of artificial intelligence has advanced significantly with the development of neural networks, which are widely applied in computer vision, natural language processing, and speech processing. Despite these advances, designing and training such networks still poses substantial challenges. This thesis addresses two critical aspects of neural network development, design and training, in the context of computer vision tasks.
The thesis focuses on three main challenges. The first is performing architecture search efficiently in an extremely large or even unbounded search space; to address it, the thesis proposes a Neural Search-space Evolution (NSE) scheme that enables efficient and effective architecture search at this scale. The second is improving the efficiency of self-supervised learning for model pretraining; to address it, the thesis proposes a combinatorial-patches approach that substantially accelerates self-supervised training. The third is building an efficient and versatile multitask model that can exploit the benefits of large-scale multitask training; to address it, the thesis proposes UniHCP, a Unified model for Human-Centric Perceptions, which unifies multiple human-centric tasks in a single compact, efficient, and scalable model.
The NSE scheme, the combinatorial-patches approach, and UniHCP have been evaluated on a broad range of datasets, tasks, and settings, with strong results that demonstrate the effectiveness of the proposed methods in making the design and training of neural networks more practical, efficient, and performant.
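The NSE scheme itself is developed in the thesis; as a loose illustration of the general search-space-evolution idea, the following minimal Python sketch shows one way a small working subspace of candidates could be evolved against an effectively unlimited pool of operations. All names, the scoring stub, and the keep/refill logic here are hypothetical stand-ins, not the thesis implementation.

```python
import random

# Hypothetical illustration of a search-space evolution loop: keep a
# small working subspace of candidate operations, estimate their
# quality, then evolve the subspace by retaining the best candidates
# and refilling from an (effectively unlimited) pool.

FULL_OP_POOL = [f"op_{i}" for i in range(1000)]  # stand-in for a huge space

def estimate_quality(op):
    """Stand-in for supernet training + validation of a candidate op."""
    return random.random()

def evolve_search_space(subspace_size=16, keep_ratio=0.5, rounds=10):
    subspace = random.sample(FULL_OP_POOL, subspace_size)
    for _ in range(rounds):
        scored = sorted(subspace, key=estimate_quality, reverse=True)
        survivors = scored[: int(subspace_size * keep_ratio)]
        # Refill with fresh candidates from the full pool, so the search
        # is never confined to the initial subspace.
        fresh = random.sample(
            [op for op in FULL_OP_POOL if op not in survivors],
            subspace_size - len(survivors),
        )
        subspace = survivors + fresh
    return subspace

if __name__ == "__main__":
    print(evolve_search_space())
```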
Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches
Contrastive self-supervised learning methods have achieved great success in recent years. However, self-supervision requires extremely long training schedules (e.g., 800 epochs for MoCo v3) to achieve promising results, which is prohibitive for much of the academic community and hinders progress on this topic. This work revisits momentum-based contrastive learning frameworks and identifies an inefficiency: two augmented views generate only one positive pair. We propose Fast-MoCo, a novel framework that utilizes combinatorial patches to construct multiple positive pairs from two augmented views, providing abundant supervision signals that bring significant acceleration at negligible extra computational cost. Fast-MoCo trained for 100 epochs achieves 73.5% linear evaluation accuracy, similar to MoCo v3 (ResNet-50 backbone) trained for 800 epochs. Extra training (200 epochs) further improves the result to 75.1%, on par with state-of-the-art methods. Experiments on several downstream tasks also confirm the effectiveness of Fast-MoCo.

Comment: Accepted for publication at the 2022 European Conference on Computer Vision (ECCV 2022).
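As a rough sketch of the combinatorial-patches idea described in the abstract (the actual Fast-MoCo pipeline differs in detail; the `encoder`, the 2x2 split, and the averaging combiner are illustrative assumptions), one augmented view can be split into patches whose encoded features are combined to yield several positives instead of one:

```python
import itertools
import torch

def patchify(view, n=2):
    """Split a (B, C, H, W) view into n*n non-overlapping patches."""
    rows = view.chunk(n, dim=2)
    return [p for r in rows for p in r.chunk(n, dim=3)]

def combinatorial_embeddings(view, encoder, combine_size=2):
    """Encode patches separately, then average each 2-patch subset so a
    single view contributes C(4, 2) = 6 synthesized positives."""
    patch_feats = [encoder(p) for p in patchify(view)]
    combos = itertools.combinations(patch_feats, combine_size)
    return [torch.stack(c).mean(dim=0) for c in combos]

if __name__ == "__main__":
    # Toy stand-in for a ResNet-50-style encoder with global pooling.
    encoder = torch.nn.Sequential(
        torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(3, 64)
    )
    view = torch.randn(8, 3, 224, 224)
    positives = combinatorial_embeddings(view, encoder)
    print(len(positives))  # 6 positives per view instead of 1
```

Each synthesized embedding would then be paired against the momentum encoder's output for the other view, multiplying the supervision signal per forward pass.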
UniHCP: A Unified Model for Human-Centric Perceptions
Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection, person re-identification, etc.) play a key role in industrial applications of visual models. While specific human-centric tasks have their own relevant semantic aspects to focus on, they also share the same underlying semantic structure of the human body. However, few works have attempted to exploit this homogeneity and design a general-purpose model for human-centric tasks. In this work, we revisit a broad range of human-centric tasks and unify them in a minimalist manner. We propose UniHCP, a Unified Model for Human-Centric Perceptions, which unifies a wide range of human-centric tasks in a simplified end-to-end manner with a plain vision transformer architecture. With large-scale joint training on 33 human-centric datasets, UniHCP can outperform strong baselines on several in-domain and downstream tasks by direct evaluation. When adapted to a specific task, UniHCP achieves new SOTAs on a wide range of human-centric tasks, e.g., 69.8 mIoU on CIHP for human parsing, 86.18 mA on PA-100K for attribute prediction, 90.3 mAP on Market1501 for ReID, and 85.8 JI on CrowdHuman for pedestrian detection, performing better than specialized models tailored for each task.

Comment: Accepted for publication at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (CVPR 2023).
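One common pattern for this kind of unification, a shared transformer backbone queried by task-specific learnable tokens, can be illustrated with a hedged PyTorch sketch. The module sizes, layer counts, and names below are assumptions for illustration only, not the UniHCP implementation:

```python
import torch
from torch import nn

class UnifiedHumanModel(nn.Module):
    """Shared encoder + per-task learnable queries + shared decoder."""

    def __init__(self, tasks, dim=256, queries_per_task=8):
        super().__init__()
        # Stand-in for a plain ViT backbone operating on patch tokens.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Each task owns a small set of learnable query embeddings.
        self.task_queries = nn.ParameterDict({
            t: nn.Parameter(torch.randn(queries_per_task, dim)) for t in tasks
        })
        # A shared decoder turns task queries into task-specific outputs.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )

    def forward(self, tokens, task):
        # tokens: (B, N, dim) patch embeddings of the input image
        memory = self.encoder(tokens)
        q = self.task_queries[task].unsqueeze(0).expand(tokens.size(0), -1, -1)
        return self.decoder(q, memory)  # (B, queries_per_task, dim)

model = UnifiedHumanModel(tasks=["pose", "parsing", "reid"])
out = model(torch.randn(2, 196, 256), task="pose")
```

The appeal of this design is that only the small query tensors are task-specific, so adding a task scales the model by a few parameters rather than a full head-per-task stack.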
HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
Human-centric perceptions include a variety of vision tasks with widespread industrial applications, including surveillance, autonomous driving, and the metaverse. It is desirable to have a general pretrained model for versatile human-centric downstream tasks. This paper forges ahead along this path from the aspects of both benchmark and pretraining methods. Specifically, we propose HumanBench, built on existing datasets, to comprehensively evaluate, on common ground, the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting. To learn both coarse-grained and fine-grained knowledge of human bodies, we further propose a Projector AssisTed Hierarchical pretraining method (PATH) to learn diverse knowledge at different granularity levels. Comprehensive evaluations on HumanBench show that PATH achieves new state-of-the-art results on 17 downstream datasets and on-par results on the other 2 datasets. The code will be made publicly available at https://github.com/OpenGVLab/HumanBench.

Comment: Accepted to CVPR 2023.
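The projector-assisted idea, inserting lightweight task-specific projectors between a shared backbone and the task heads so task-specific signals are absorbed before they reach the general features, can be sketched as follows. This is a minimal illustration under assumed names and sizes, not the PATH implementation:

```python
import torch
from torch import nn

class ProjectorAssistedModel(nn.Module):
    """Shared backbone -> per-task projector -> per-task head."""

    def __init__(self, tasks, dim=256):
        super().__init__()
        # Stand-in for a ViT backbone; kept task-agnostic for transfer.
        self.backbone = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        # Lightweight projectors absorb task-specific knowledge during
        # multitask pretraining, shielding the shared backbone.
        self.projectors = nn.ModuleDict({
            t: nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for t in tasks
        })
        self.heads = nn.ModuleDict({t: nn.Linear(dim, dim) for t in tasks})

    def forward(self, x, task):
        shared = self.backbone(x)  # general human-centric features
        return self.heads[task](self.projectors[task](shared))

model = ProjectorAssistedModel(tasks=["reid", "pose", "parsing"])
out = model(torch.randn(4, 256), task="reid")
```

After pretraining, the projectors and heads can be discarded and only the shared backbone transferred, which is what makes the pretrained features reusable across the benchmark's downstream tasks.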